NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

AI’s challenge of understanding the world

https://doi.org/10.1126/science.adm8175

Mitchell, Melanie (November 2023, Science)

Full Text Available
How do we know how smart AI systems are?

https://doi.org/10.1126/science.adj5957

Mitchell, Melanie (July 2023, Science)

In 1967, Marvin Minksy, a founder of the field of artificial intelligence (AI), made a bold prediction: “Within a generation…the problem of creating ‘artificial intelligence’ will be substantially solved.” Assuming that a generation is about 30 years, Minsky was clearly overoptimistic. But now, nearly two generations later, how close are we to the original goal of human-level (or greater) intelligence in machines?
more » « less
Full Text Available
The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

Moskvichev, Arsenii Kirillovich; Odouard, Victor Vikram; Mitchell, Melanie (August 2023, Transactions on machine learning research)

The abilities to form and abstract concepts is key to human intelligence, but such abilities remain lacking in state-of-the-art AI systems. There has been substantial research on conceptual abstraction in AI, particularly using idealized domains such as Raven's Progressive Matrices and Bongard problems, but even when AI systems succeed on such problems, the systems are rarely evaluated in depth to see if they have actually grasped the concepts they are meant to capture. In this paper we describe an in-depth evaluation benchmark for the Abstraction and Reasoning Corpus (ARC), a collection of few-shot abstraction and analogy problems developed by Chollet [2019]. In particular, we describe ConceptARC, a new, publicly available benchmark in the ARC domain that systematically assesses abstraction and generalization abilities on a number of basic spatial and semantic concepts. ConceptARC differs from the original ARC dataset in that it is specifically organized around "concept groups" -- sets of problems that focus on specific concepts and that are vary in complexity and level of abstraction. We report results on testing humans on this benchmark as well as three machine solvers: the top two programs from a 2021 ARC competition and OpenAI's GPT-4. Our results show that humans substantially outperform the machine solvers on this benchmark, showing abilities to abstract and generalize concepts that are not yet captured by AI systems. We believe that this benchmark will spur improvements in the development of AI systems for conceptual abstraction and in the effective evaluation of such systems.
more » « less
Full Text Available
The debate over understanding in AI’s large language models

https://doi.org/10.1073/pnas.2215907120

Mitchell, Melanie; Krakauer, David C. (March 2023, Proceedings of the National Academy of Sciences)

We survey a current, heated debate in the artificial intelligence (AI) research community on whether large pretrained language models can be said to understand language—and the physical and social situations language encodes—in any humanlike sense. We describe arguments that have been made for and against such understanding and key questions for the broader sciences of intelligence that have arisen in light of these arguments. We contend that an extended science of intelligence can be developed that will provide insight into distinct modes of understanding, their strengths and limitations, and the challenge of integrating diverse forms of cognition.
more » « less
Full Text Available
Evaluating Understanding on Conceptual Abstraction Benchmarks

https://doi.org/10.48550/arXiv.2206.14187

Odouard, Victor V.; Mitchell, Melanie. (June 2022, Proceedings of the AI Evaluation Beyond Metrics Workshop (IJCAI-2022))

A long-held objective in AI is to build systems that understand concepts in a humanlike way. Setting aside the difficulty of building such a system, even trying to evaluate one is a challenge, due to present-day AI's relative opacity and its proclivity for finding shortcut solutions. This is exacerbated by humans' tendency to anthropomorphize, assuming that a system that can recognize one instance of a concept must also understand other instances, as a human would. In this paper, we argue that understanding a concept requires the ability to use it in varied contexts. Accordingly, we propose systematic evaluations centered around concepts, by probing a system's ability to use a given concept in many different instantiations. We present case studies of such an evaluations on two domains -- RAVEN (inspired by Raven's Progressive Matrices) and the Abstraction and Reasoning Corpus (ARC) -- that have been used to develop and assess abstraction abilities in AI systems. Our concept-based approach to evaluation reveals information about AI systems that conventional test sets would have left hidden.
more » « less
Full Text Available
Rethink reporting of evaluation results in AI

https://doi.org/10.1126/science.adf6369

Burnell, Ryan; Schellaert, Wout; Burden, John; Ullman, Tomer D.; Martinez-Plumed, Fernando; Tenenbaum, Joshua B.; Rutar, Danaja; Cheke, Lucy G.; Sohl-Dickstein, Jascha; Mitchell, Melanie; et al (April 2023, Science)

Artificial intelligence (AI) systems have begun to be deployed in high-stakes contexts, including autonomous driving and medical diagnosis. In contexts such as these, the consequences of system failures can be devastating. It is therefore vital that researchers and policy-makers have a full understanding of the capabilities and weaknesses of AI systems so that they can make informed decisions about where these systems are safe to use and how they might be improved. Unfortunately, current approaches to AI evaluation make it exceedingly difficult to build such an understanding, for two key reasons. First, aggregate metrics make it hard to predict how a system will perform in a particular situation. Second, the instance-by-instance evaluation results that could be used to unpack these aggregate metrics are rarely made available ( 1 ). Here, we propose a path forward in which results are presented in more nuanced ways and instance-by-instance evaluation results are made publicly available.
more » « less
Full Text Available
Abstraction and analogy‐making in artificial intelligence

https://doi.org/10.1111/nyas.14619

Mitchell, Melanie (June 2021, Annals of the New York Academy of Sciences)

Abstract Conceptual abstraction and analogy‐making are key abilities underlying humans' abilities to learn, reason, and robustly adapt their knowledge to new domains. Despite a long history of research on constructing artificial intelligence (AI) systems with these abilities, no current AI system is anywhere close to a capability of forming humanlike abstractions or analogies. This paper reviews the advantages and limitations of several approaches toward this goal, including symbolic methods, deep learning, and probabilistic program induction. The paper concludes with several proposals for designing challenge tasks and evaluation measures in order to make quantifiable and generalizable progress in this area.
more » « less
Pyrene-Functionalized Fluorescent Nanojars: Synthesis, Mass Spectrometric, and Photophysical Studies

https://doi.org/10.1021/acsomega.1c05619

Mitchell, Melanie M.; Liyana Gunawardana, Vageesha W.; Ramakrishna, Guda; Mezei, Gellert (November 2021, ACS Omega)

Search for: All records